THIS PAGE PROVIDES SUPPLEMENTARY INFORMATION FOR MY APPLICATION TO CEBEM’S MOBILITY FELLOWSHIP (THIRD CALL, 2024).

This page first outlines why fusogens matter from an evolutionary perspective. It then details the proposed approaches for remote homology searches (with schematic figures) and presents preliminary results from the analysis of two protein families related to fusogens: fusexins and dynamin-like proteins.


Membrane fusion: a fundamental process

Life as we know it depends heavily on cellular membranes. The very existence of cells requires the compartmentalization that these membranes provide. The emergence and evolution of different cell types, from the appearance of the first cells (estimated at more than 3,000 million years ago) to the present, have depended largely on the dynamics of interaction and fusion acquired by these cells and by the various compartments delimited within them. A variety of key phenomena depend on membrane fusion, such as intra- and extracellular transport (enabling the acquisition and expulsion of nutrients, or cell communication), muscle tissue formation, and the invasion of cells by enveloped viruses. Notably, sex, a defining characteristic of eukaryotes, is fundamentally based on the fusion of gametes.

These events are far from being completely random and spontaneous: proteins called fusogens are part of the ad hoc cellular machinery that catalyzes these thermodynamically unfavorable fusions. The specificity provided by these cellular fusogens underlies the incessant, and at first glance chaotic, dance of membranes that underpins life.

Searching for fusogens using computational methods

Recent studies from our research group and collaborators have allowed us to predict and experimentally validate a eukaryotic fusogen responsible for the fusion of gametes. We found that this fusogen is homologous to the class II fusogens of enveloped viruses, revealing an ancient exchange between viruses and eukaryotes.

This finding has important implications for formulating evolutionary questions regarding the origin of sex, a turning point in the evolution of cellular complexity. Moreover, our more recent results show that these fusogens (fusexins) also exist in Archaea, which broadens the picture. Was sex an invention of archaea, viruses, or eukaryotes? What role do archaeal fusexins play? These questions remain open to date.

The landscape is even more enigmatic in other cases. For example, sexual fusogens in fungi and vertebrates are unknown. Other cellular fusogens described in the literature have not been studied thoroughly in terms of either their distribution or their evolutionary dynamics, both of which are indispensable for understanding the evolution of some of the biological processes mentioned earlier.

The appearance of new tools for the inference of structural models such as AlphaFold2 has been revolutionary in this regard, as they allow the detection of remote homologs from the alignment of structural models.

Proposal for iterative search of fusogens

A comprehensive search for homologues is always needed to elucidate the evolutionary history of any protein family. This is possible only with: i) a diverse database to search against and ii) an efficient algorithm to perform the search.

The following database assembly is proposed to obtain a taxonomically diverse set of eukaryotic structures in which to search for fusogen families, as illustrated in the following figure:

Structural database assembly to obtain a phylogenetically diverse set of eukaryotic protein structures. Icons taken from bioicons (www.bioicons.com).

An iterative procedure, based on this structural database and the publicly available Foldseek Clusters DB, is proposed to obtain the set of homologues in each case (figure below):

Iterative procedure proposed to detect remote homologues for fusexins and fusogenic DLPs employing structural information. Icons taken from bioicons (www.bioicons.com).
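
The general shape of such an iterative search can be sketched in a few lines of Python. This is only a minimal illustration, not the actual pipeline: the `search` callable is a hypothetical stand-in for a real structural search against the databases described above, and `TOY_HITS`, `toy_search` and `max_rounds` are assumptions introduced for the example.

```python
from typing import Callable, Iterable, Set


def iterative_homolog_search(
    seeds: Iterable[str],
    search: Callable[[str], Set[str]],
    max_rounds: int = 10,
) -> Set[str]:
    """Expand a seed set until no new hits appear or max_rounds is reached."""
    found = set(seeds)
    frontier = set(seeds)  # queries to run in the current round
    for _ in range(max_rounds):
        new_hits: Set[str] = set()
        for query in frontier:
            # Keep only hits that were not collected in earlier rounds.
            new_hits |= search(query) - found
        if not new_hits:
            break  # converged: the last round added nothing new
        found |= new_hits
        frontier = new_hits  # only newly found hits become queries
    return found


# Toy stand-in for a structural search (hypothetical identifiers).
TOY_HITS = {
    "seed": {"a", "b"},
    "a": {"seed", "c"},
    "b": set(),
    "c": {"a"},
}


def toy_search(query: str) -> Set[str]:
    return TOY_HITS.get(query, set())


result = iterative_homolog_search(["seed"], toy_search)
print(sorted(result))
```

Using only the newly found hits as queries in each round avoids repeating searches, and the `max_rounds` cap guards against unbounded expansion when permissive thresholds pull in unrelated proteins.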

Advances in the study of fusexins

A set of fusexin homologs has been assembled from previous work. Comparing their structures clearly shows their structural homology, as seen in the following figure:

Structural comparison of 3D models of archaeal, eukaryotic (HAP2/GCS1) and viral (class II) fusexin ectodomains. Structural models were downloaded from PDB or inferred with AlphaFold2 (employing ColabFold). Flexible alignments were performed with FATCAT, and possible structural homology was determined with TM-align (TM-score >= 0.5). Clear structural homology is observed among the Eukarya, Virus and Archaea models, with comparisons within the same domain showing higher scores. Domains are shown in color at the edges of the heatmap.
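
As a rough illustration of how the TM-score cutoff in the caption can be applied, the sketch below flags pairs at or above 0.5 and compares mean scores within and between domains. The scores, identifiers and helper names are toy assumptions made up for the example, not values from the analysis.

```python
from statistics import mean

TM_THRESHOLD = 0.5  # cutoff stated in the figure caption

# Hypothetical toy scores; real values would come from TM-align output.
toy_scores = {
    ("arch1", "arch2"): 0.82,
    ("arch1", "euk1"): 0.61,
    ("arch1", "vir1"): 0.55,
    ("arch2", "euk1"): 0.58,
    ("arch2", "vir1"): 0.47,
    ("euk1", "vir1"): 0.63,
}
toy_domain = {
    "arch1": "Archaea",
    "arch2": "Archaea",
    "euk1": "Eukarya",
    "vir1": "Virus",
}


def homologous_pairs(scores, threshold=TM_THRESHOLD):
    """Return the pairs whose TM-score reaches the threshold."""
    return sorted(pair for pair, tm in scores.items() if tm >= threshold)


def mean_score_by_relation(scores, domain):
    """Mean TM-score for within-domain and between-domain pairs."""
    within, between = [], []
    for (a, b), tm in scores.items():
        (within if domain[a] == domain[b] else between).append(tm)
    return (
        mean(within) if within else None,
        mean(between) if between else None,
    )


pairs = homologous_pairs(toy_scores)
within_mean, between_mean = mean_score_by_relation(toy_scores, toy_domain)
print(pairs)
print(within_mean, between_mean)
```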

But what can we say about the evolution of the whole superfamily? A first approach, using FoldTree to infer a phylogenetic tree from pairwise structural distances and rooting the resulting tree with the MAD (minimal ancestor deviation) algorithm, shows the following:

Rooted phylogenetic tree inferred from structural data. Pairwise structural comparison scores (TM-scores) were used to calculate a distance matrix (distance metric: 1 - TM-score) for fusexin ectodomains. A phylogeny was inferred by the minimum evolution method and rooted with the minimal ancestor deviation (MAD) algorithm. The root corresponds to the point where the AD score is minimized (indicated by branch color). The results do not allow discerning between a viral and a cellular origin for the family.

Neither a cellular nor a viral origin for this fusogen family can be inferred from this analysis. Further work using improved approaches and larger sets of homologues may help to elucidate this evolutionary history, and could provide clues about how eukaryotic fusexins (and possibly part of eukaryotic sex) evolved.
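
The distance metric named in the figure caption (1 - TM-score) can be illustrated with a short Python sketch. Because a TM-score depends on which structure's length is used for normalization, the two directions of a comparison can differ; averaging them, as done here, is an assumption made for this example and not necessarily what FoldTree does. The function name and toy values are also hypothetical.

```python
def tm_to_distance(names, tm):
    """Build a symmetric distance matrix from directional TM-scores.

    tm[(a, b)] is the TM-score of the a/b alignment normalized by the
    length of a. Symmetrizing by averaging is an assumption of this sketch.
    """
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                sym = (tm[(a, b)] + tm[(b, a)]) / 2.0
                dist[i][j] = dist[j][i] = 1.0 - sym
    return dist


# Hypothetical toy input for three structures.
toy_names = ["x", "y", "z"]
toy_tm = {
    ("x", "y"): 0.80, ("y", "x"): 0.70,
    ("x", "z"): 0.50, ("z", "x"): 0.60,
    ("y", "z"): 0.40, ("z", "y"): 0.40,
}

matrix = tm_to_distance(toy_names, toy_tm)
for name, row in zip(toy_names, matrix):
    print(name, " ".join(f"{d:.2f}" for d in row))
```

A matrix of this form (symmetric, zero diagonal) is the kind of input that distance-based tree inference methods such as minimum evolution expect.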

Interestingly, a classical approach that infers phylogenies with sequence-based methods (restricted to Archaea and Eukarya) supports an archaeal origin of eukaryotic sex:

Phylogenetic tree inferred from archaeal and eukaryotic fusexin sequence data. Fusexin ectodomains were aligned and trimmed, and the resulting alignment was recoded following the Hanada scheme, encoding conservative and radical amino acid changes with different characters. Phylogenetic inference was done by maximum likelihood, using non-reversible models in order to polarize character changes. Root support values are shown for each branch (lower values: blue, higher values: pink). The results suggest an archaeal origin for fusexins. Due to the loss of phylogenetic signal, only eukaryotic and archaeal sequences were used.

Advances in the study of dynamin-like proteins

This interesting family includes members that are known to be mitochondrial fusogens in eukaryotes. Bacterial members have been discovered in recent decades, but their functions remain unknown in many cases (e.g., it has been postulated that they might have a role in membrane remodelling, or in fusing lamellar structures in Cyanobacteria).

What is the taxonomic distribution of this family? How diverse are its members in eukaryotes and prokaryotes? No systematic study of its whole evolutionary history has been made to date, probably due to the extent of sequence divergence. A first approach shows the following preliminary results:

Presence of putative homologs of fusogens of the DLP family in the eukaryote supergroups defined by Burki et al. (2020), detected using different approaches. Presence reported in a previous study by Sinha & Manoj (2019) is also marked (“R”), as is the representation of the supergroups in the UniProt database (used as input by Foldseek Clusters).
Distribution of putative prokaryotic homologs for DLPs projected into a previously published prokaryotic tree with taxonomic diversity. Both structural and sequence putative homologs are marked in the figure.

For both eukaryotic and prokaryotic members, a large number of previously unknown putative homologs are detected using either profile-based or structural searches. Further study is needed to establish a complete set of homologues for this family, and to perform phylogenetic inference in order to delimit a clade for known fusogens (and probable novel fusogens) and study their evolutionary dynamics.

Interestingly, a pairwise structural comparison of some of the collected homologs (using structures of known members of the family as seeds for structural searches) shows a clear delimitation of two possible clades:

Pairwise structural comparison of cluster members from the Foldseek Clusters database. The bitscore value reported by Foldseek for each structural alignment is shown. Structures from six clusters involving DLPs with structures already reported in the literature were used for the analysis: bacterial DLPs (from Nostoc punctiforme and Synechocystis sp. PCC 6803), outer membrane mitochondrial fusion mitofusins (yeast Fzo1, vertebrate Mfn1) and inner membrane mitofusins (yeast Mgm1 and vertebrate OPA1). The domain (Eukarya or Archaea) with which each structure is annotated in the UniProt database is also indicated.

This result can lead to at least two hypotheses for the origin of mitochondrial fusogens. Elucidating this evolutionary history further requires, again, delimiting a final set of homologues and using structure-based phylogenetics, given the extent of divergence between the members of this intriguing family.